Enlisting the Ghost: Modeling Empty Categories for Machine Translation

Authors

  • Bing Xiang
  • Xiaoqiang Luo
  • Bowen Zhou

Abstract

Empty categories (ECs) are artificial elements in Penn Treebanks, motivated by government-binding (GB) theory to explain certain language phenomena such as pro-drop. ECs are ubiquitous in languages like Chinese, but they are tacitly ignored in most machine translation (MT) work because of their elusive nature. In this paper we present a comprehensive treatment of ECs. We first recover them with a structured MaxEnt model that uses a rich set of syntactic and lexical features. We then incorporate the predicted ECs into a Chinese-to-English machine translation task through multiple approaches, including the extraction of EC-specific sparse features. We show that the recovered empty categories not only improve word alignment quality but also lead to significant improvements in a large-scale, state-of-the-art syntactic MT system.

Similar resources

Effects of Empty Categories on Machine Translation

We examine effects that empty categories have on machine translation. Empty categories are elements in parse trees that lack corresponding overt surface forms (words) such as dropped pronouns and markers for control constructions. We start by training machine translation systems with manually inserted empty elements. We find that inclusion of some empty categories in training data improves the ...

Integrating empty category detection into preordering Machine Translation

We propose a method for integrating Japanese empty category detection into the preordering process of Japanese-to-English statistical machine translation. First, we apply machine-learning-based empty category detection to estimate the position and the type of empty categories in the constituent tree of the source sentence. Then, we apply discriminative preordering to the augmented constituent tr...

Democracy – The Real ‘Ghost’ in the Machine of Global Health Policy; Comment on “A Ghost in the Machine? Politics in Global Health Policy”

Politics is not the ghost in the machine of global health policy. Conceptually, it makes little sense to argue otherwise, while history is replete with examples of individuals and movements engaging politically in global health policy. Were one looking for ghosts, a more likely candidate would be democracy, which is currently under attack by a new global health technocracy. Civil society moveme...

Chasing the ghost: recovering empty categories in the Chinese Treebank

Empty categories represent an important source of information in syntactic parses annotated in the generative linguistic tradition, but empty category recovery has only recently started to receive serious attention, after substantial progress in statistical parsing. This paper describes a unified framework for recovering empty categories in the Chinese Treebank. Our results show that ...

Ghost Image Mapping of Palatal Bone of Maxilla and Nasal Cavity in Panoramic View Using Cranex D Digital Machine

Introduction: The mapping of ghost images of the maxilla and the nasal cavity, which are complex structures, is very important. The position of objects that create a ghost image can differ when using various devices. The purpose of this investigation was to study the mapping of ghost images of the maxilla and the nasal cavity using a Cranex D digital panoramic machine. Materials and methods: ...

Publication date: 2013